Curb
An Online Algorithm to Detect and Reduce the Spread of Fake News and Misinformation in Social Networks
- Presented at WSDM 2018
- Full paper at arxiv.org/abs/1711.09918
- Source code at Networks-Learning/Curb
Introduction
In recent years, social media and online social networking sites have become a major disseminator of false facts, urban legends, fake news or, more generally, misinformation. In this context, there are growing concerns that misinformation on these platforms has fueled the emergence of a post-truth society, where debate is perniciously framed by the repeated assertion of talking points, while factual rebuttals by the media or independent experts are ignored.
Crowd-Powered Reduction of Misinformation
In an effort to curb the spread of misinformation, major online social networking sites, such as Facebook, Twitter or Weibo, are resorting, or considering resorting, to the crowd (see "News Feed FYI: Addressing Hoaxes and Fake News", "Twitter is looking for ways to let users flag fake news, offensive content", and "How China’s highly censored WeChat and Weibo fight fake news ... and other controversial content"). These platforms allow their users to flag a story as misinformation. If the story receives enough flags, it is directed to a coalition of independent organizations for fact-checking. This coalition includes, among many others, Snopes, FactCheck, Politifact, and signatories of Poynter's International Fact Checking Code of Principles. If the fact-checking organizations identify a story as misinformation, it gets flagged as disputed and may also appear lower in users' feeds, reducing the number of people who are exposed to it.
When-to-Fact-Check Problem
In the procedure above, third-party fact-checking is costly. In this work, we therefore aim to optimize the procedure by deciding which stories to fact-check and when to do so. A desirable model for the fact-checking problem would take into account the probability of a story being misinformation and only fact-check stories that are both likely to be misinformation and spreading rapidly through the network. At the same time, the model would not waste resources on stories that have already been fact-checked. To develop a model that enables such an efficient fact-checking procedure, we use the framework of marked temporal point processes and solve a novel stochastic optimal control problem for stochastic differential equations with jumps, deriving an optimal policy for when to fact-check stories in online social networks.
We start by characterizing the number of exposures to an unverified story \(s\) using a counting process \(N_s^e(t)\), whose associated intensity \(\lambda^{e}_s(t)\) characterizes the expected rate of exposures at any given time \(t\). Similarly, \(N_s^f(t)\) counts the subset of exposures in which users flag the story. Then we can compute the average number of users exposed to misinformation by time \(t\), denoted by \(\bar{N}^{m}_s(t)\), as: $$ \begin{align} \bar{N}^{m}_s(t) &:= {\color{misinformation}p_{m | f=1} N^f_s(t) + p_{m | f=0} ( N^{e}_s(t) - N^f_s(t) )}, \end{align} $$ where \(p_{m | f=1}\) and \(p_{m | f=0}\) denote the conditional probabilities of a story being misinformation given a flag and given no flag, respectively.
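As a concrete illustration, here is a minimal Python sketch (not the authors' released code) of the bookkeeping above; the counter values and probabilities are made-up numbers chosen only for the example.

```python
# Illustrative sketch of the average misinformation exposure count above.

def avg_misinformation_exposures(n_exposures, n_flags,
                                 p_m_given_flag, p_m_given_noflag):
    """Average number of users exposed to misinformation by time t:
    p(m|f=1) * N^f(t) + p(m|f=0) * (N^e(t) - N^f(t))."""
    return (p_m_given_flag * n_flags
            + p_m_given_noflag * (n_exposures - n_flags))

# Example: 500 exposures so far, 30 of them flagged by users.
print(avg_misinformation_exposures(500, 30,
                                   p_m_given_flag=0.8, p_m_given_noflag=0.05))
# -> 47.5
```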
\(\bar{N}^{m}_s(t)\) is also a point process, with intensity function \(\hat{\lambda}_s^m(t)\): $$ \begin{align} \hat{\lambda}_s^m(t)\, dt = \mathbb{E}_{f_s(t), f_s}[d\bar{N}^{m}_{s}(t)] = \mathbb{E}_{f_s(t), f_s} {\color{misinformation}\Big [ (p_{m | f=1} - p_{m | f=0})f_s(t) + p_{m | f=0} \Big ]} {\color{spread}\lambda^{e}_s(t)}\, dt. \end{align} $$ In this equation, \(f_s(t)\) indicates whether the exposure at time \(t\) is flagged. The probability that a user flags a story depends on whether the story is misinformation, which we do not know before the story is fact-checked. We therefore estimate this probability from the available data using Bayesian statistics: in our formulation, we assume a \(\mathrm{Beta}(\alpha,\beta)\) prior on the flag probability and use the posterior predictive distribution to estimate the rate of spread of misinformation at any given time.
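The sketch below illustrates this posterior-predictive estimate under the stated Beta prior; the function and parameter names (alpha, beta, exposure_intensity, ...) are illustrative assumptions, not taken from the paper's code.

```python
# Illustrative sketch of the Bayesian estimate above; not the authors' implementation.

def posterior_flag_probability(alpha, beta, n_exposures, n_flags):
    """Posterior predictive probability that the next exposure gets flagged,
    under a Beta(alpha, beta) prior on the (unknown) flag probability."""
    return (alpha + n_flags) / (alpha + beta + n_exposures)

def estimated_misinfo_intensity(exposure_intensity, flag_prob,
                                p_m_given_flag, p_m_given_noflag):
    """Estimated rate of spread of misinformation (lambda-hat^m at time t)."""
    return ((p_m_given_flag - p_m_given_noflag) * flag_prob
            + p_m_given_noflag) * exposure_intensity

flag_prob = posterior_flag_probability(alpha=1.0, beta=10.0,
                                       n_exposures=500, n_flags=30)
rate = estimated_misinfo_intensity(exposure_intensity=2.0, flag_prob=flag_prob,
                                   p_m_given_flag=0.8, p_m_given_noflag=0.05)
```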
Using the above notation, we can now formulate the when-to-fact-check problem as follows: $$ \begin{align} \label{eq:problem} & \underset{u(t_0, t_f]}{\text{minimize}} \quad \,\, \mathbb{E} \left[ \phi( \hat{\lambda}^m(t_f) ) + \int_{t_0}^{t_f}{ \ell ( \hat{\lambda}^m(\tau), u(\tau))d\tau } \right] \nonumber \\ & \text{subject to} \quad u(t) \geq 0 \quad \forall t \in (t_0, t_f], \end{align} $$ where \(u(t)\) is the fact-checking intensity that we optimize, \(\phi(\cdot)\) is an arbitrary penalty function on the terminal value \(\hat{\lambda}^m(t_f)\), and \(\ell(\cdot,\cdot)\) is a loss function that depends on the estimated intensity of the spread of misinformation \(\hat{\lambda}^m(t)\) and on the fact-checking intensity \(u(t)\). In this work, we assume the following quadratic forms for \(\phi(\cdot)\) and \(\ell(\cdot,\cdot)\): $$ \begin{align*} \phi(\hat{\lambda}^m(t_f)) &= \frac{1}{2}{ (\hat{\lambda}^{m}(t_f))^2} \\ \ell(\hat{\lambda}^m(t), u(t)) &= \frac{1}{2}{ (\hat{\lambda}^{m}(t))^2 } + \frac{1}{2}{ q u^{2} (t) }, \end{align*} $$ where \(q\) is a parameter that trades off fact-checking effort against the spread of misinformation: smaller values of \(q\) correspond to more resources available for fact-checking, while larger values correspond to fewer.
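As a small sketch, the two quadratic terms above can be written directly as Python functions; the names are illustrative, and the value of \(q\) is whatever trade-off the platform chooses.

```python
# Illustrative quadratic penalty and running loss from the objective above.

def phi(lambda_m_final):
    """Terminal penalty on the estimated misinformation intensity at t_f."""
    return 0.5 * lambda_m_final ** 2

def loss(lambda_m, u, q):
    """Running loss: penalizes both misinformation spread and fact-checking effort.
    Small q makes fact-checking cheap; large q makes it expensive."""
    return 0.5 * lambda_m ** 2 + 0.5 * q * u ** 2
```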
Optimal Solution for the Fact-Checking Problem
Using the above formulation, we solve the when-to-fact-check problem using the stochastic optimal control formalism. The resulting optimal fact-checking intensity takes the following form: $$ u^{*}(t) = q^{- \frac{1}{2}} {\color{checked}(1-M(t))} {\color{misinformation}\Big [ (p_{m | f=1} - p_{m | f=0} ) \left( \frac{\alpha + N^f(t)}{\alpha + \beta + N^{e}(t)} \right) + p_{m | f=0} \Big ]}{\color{spread} \lambda^{e}(t)}, $$ where \(M(t)\) indicates whether the story has been fact-checked by time \(t\). This optimal intensity depends on three factors, which satisfy the requirements we set out at the beginning (a short sketch of the policy follows the list):
- whether the story has already been fact-checked,
- the estimated probability that the story is misinformation,
- the rate at which the story is spreading through the network.
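Below is a minimal sketch of this closed-form policy for a single story; the argument names and example values are illustrative assumptions, and the authoritative implementation is the released code at Networks-Learning/Curb.

```python
# Sketch of the optimal fact-checking intensity u*(t) above (illustrative only).

def optimal_fact_check_intensity(q, already_checked, alpha, beta,
                                 n_exposures, n_flags,
                                 p_m_given_flag, p_m_given_noflag,
                                 exposure_intensity):
    if already_checked:        # (1 - M(t)) = 0: never re-check a verified story
        return 0.0
    # Posterior predictive flag probability: (alpha + N^f) / (alpha + beta + N^e)
    flag_prob = (alpha + n_flags) / (alpha + beta + n_exposures)
    # Estimated probability that the story is misinformation
    misinfo_prob = (p_m_given_flag - p_m_given_noflag) * flag_prob + p_m_given_noflag
    # Scale by the exposure intensity and the fact-checking budget trade-off q
    return q ** -0.5 * misinfo_prob * exposure_intensity

u_star = optimal_fact_check_intensity(q=4.0, already_checked=False,
                                      alpha=1.0, beta=10.0,
                                      n_exposures=500, n_flags=30,
                                      p_m_given_flag=0.8, p_m_given_noflag=0.05,
                                      exposure_intensity=2.0)
```

In practice, \(u^{*}(t)\) would serve as the intensity of a point process from which fact-checking times are sampled, for example via a standard thinning procedure.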
More details
Our paper contains a detailed mathematical description of Curb, and several extensions including, but not limited to:
- Description of the generative process associated with exposure to misinformation.
- Extension to handle multiple stories.
- Experiments with two datasets from Twitter and Weibo, demonstrating the efficacy of Curb compared to several common baselines.
Authors
Authors of Curb are Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schölkopf and Manuel Gomez-Rodriguez.
Acknowledgements
The Curb icon was made by Freepik from www.flaticon.com. It is licensed under CC BY 3.0.